Comments for MEDB 5510, Week 09

Topics to be covered

  • What you will learn
    • Internal validity
    • External validity
    • Measurement validity
    • Five case studies
    • Three dichotomies of measurement
    • Dichotomy examples
    • Split half reliability
    • KR-20 and Cronbach’s alpha

Three types of validity

  • Internal validity
    • “The extent to which we can infer that the independent variable caused the dependent variable.”
  • External validity
    • “The extent to which the findings will generalize to other populations, settings, measures, and treatments.”
  • Measurement validity
    • “The quality of accuracy of individual measures or scores. The extent to which a score measures what it was intended to measure.”
    • Also measurement reliability

Internal Validity

  • “The extent to which we can infer that the independent variable caused the dependent variable.”
  • Three criteria for causality
    • IV must precede the outcome variable
    • IV must be related to the outcome
    • There must be no other variables that could explain why the IV is related to the outcome

Establishing internal validity by the research approach

  • Hierarchy
    • Randomized Experiments
    • Quasi-Experimental studies
    • Comparative
    • Associational
    • Descriptive

Internal Validity is equivalence and control

  • Evaluating the internal validity of a study means assessing two things:
    • Equivalence of the groups on participant characteristics
    • Control of extraneous experiences and environmental variables

Establishing equivalence

  • Are groups equivalent prior to introduction of IV?
    • Assured without further work in randomized studies
    • Empirical comparisons for non-randomized studies
    • Can you use matching or statistical adjustments?

Establishing control

  • Extraneous and environmental variables
    • Not of direct interest
    • Influence the outcome
    • Imbalanced
  • Example: contamination
  • Is one group affected more than the other?
    • Less of an issue for laboratory studies

Other threats to internal validity

  • Regression to the mean
  • Dropouts/attrition
  • Bias in assignment
  • Carryover effects
  • Changes in environment
  • Instrument or observer inconsistency
  • Patient expectations
  • Observer bias

Break #1

  • What you have learned
    • Internal validity
  • What’s coming next
    • External validity

External Validity

  • Population external validity
    • Representative sample
    • Few or no restrictions
    • Few or no dropouts
  • Ecological external validity
    • Naturalness
      • Setting
      • Procedures

Distinction between internal and external validity

  • Sampling process influences external validity
    • Random samples versus non-random samples
  • Treatment allocation influences internal validity
    • Randomized design versus non-randomized design

Trade-offs between internal and external validity

  • High degree of control
    • Avoid issues of equivalence
    • Unnatural setting
  • Low degree of control
    • More chances of contamination
    • Closer to how medicine is practiced.

Break #2

  • What you have learned
    • External validity
  • What’s coming next
    • Measurement validity

Measurement quotes (1 of 2)

  • “The government is extremely fond of amassing great quantities of statistics. These are raised to the Nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases.”
    • Sir Josiah Stamp, as quoted on Quotetab.

Measurement quotes (2 of 2)

  • “only scientists are arrogant enough to think that they always observe with rigorous and objective scrutiny”
    • Stephen Jay Gould, The Mismeasure of Man, page 36.

Measurements that warrant closer scrutiny

  • Patient reported outcomes
    • Participant report
  • Researcher evaluations
    • Scrutinize only when subjectivity is a concern
  • Psychological constructs
  • Composite scores

Measurement Reliability

  • Synonyms: consistency, precision, stability
  • No measurement is perfectly reliable
  • Dependent on the population
  • Look for prior efforts in reliability

Measurement Validity, 1

  • Reliability by itself is not enough.
    • Consistent measures of the “wrong thing” is bad
  • Examples of the wrong thing
    • Measuring anxiety instead of stress
    • Measuring transient changes in a patient’s mood rather than chronic depression

Measurement Validity, 2

  • Validity
    • “Degree to which a measure … measures that which it was intended to measure”
  • Reliability is a pre-requisite for validity
  • Validity is a journey and not a destination

Break #3

  • What you have learned
    • Measurement validity
  • What’s coming next
    • Five case studies

Case study #1 - Neighborhood Environment Survey

Case study #2 - Pain scale

Case study #3 - Apgar score

Case study #4 - Boston Bowel Prep Score

Excerpt from Lai et al article

Case study #5 - Disgust Scale Revised

Break #4

  • What you have learned
    • Five case studies
  • What’s coming next
    • Three dichotomies of measurement

First dichotomy: Who is the measurer?

  • Self reported outcomes
    • Also known as patient reported outcomes
  • Researcher evaluations
    • Only when concerned about subjectivity

Second dichotomy: How many pieces?

  • Composite scores
    • Sum or average
  • Single measurement

Third dichotomy: is the measure soft or hard? 1

  • Psychological or social constructs
    • Created and accepted by you and me
    • Impossible to observe directly
    • Examples: stress, anxiety

Third dichotomy: is the measure soft or hard? 2

  • Biological or physical measure
    • Has an objective reality
    • Potential for direct observation
    • Example: obesity, dementia

Break #5

  • What you have learned
    • Three dichotomies of measurement
  • What’s coming next
    • Dichotomy examples

Examples of self reported outcomes

Neighborhood Environment Survey

Pain scale

Examples of researcher evaluation

Apgar score

Boston Bowel Prep Score

Examples of composite scores

Neighborhood Environment Survey

Apgar score

Examples of single measurements

Pain scale

Boston Bowel Prep Score

Examples of constructs

Apgar score

Neighborhood Environment Survey

Examples of biological or physical measurements

Boston Bowel Prep Score

Pain scale

Pop quiz

  • Is the disgust scale
    • a self report or a researcher evaluation?
    • a composite measure or a single measurement?
    • a psychological/social construct or a biological/physical measurement?

Break #6

  • What you have learned
    • Dichotomy examples
  • What’s coming next
    • Split half reliability

Measurement Reliability

  • Synonyms: consistency, precision, stability
  • Classical test theory
    • Observed value = True value + Measurement error
    • This is a purely hypothetical model
  • Reliability coefficient
    • Variance of true values / Variance of measured values
  • Depends on your population
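
To make the classical test theory model concrete, here is a minimal simulation (hypothetical numbers; assumes normally distributed true scores and errors) showing that the reliability coefficient recovers Var(true) / Var(observed):

```python
import random

random.seed(1)

# Classical test theory: Observed value = True value + Measurement error.
# Reliability coefficient = Var(true) / Var(observed).
N = 10_000
true_scores = [random.gauss(50, 10) for _ in range(N)]    # Var(true) = 100
observed = [t + random.gauss(0, 5) for t in true_scores]  # Var(error) = 25

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # close to 100 / (100 + 25) = 0.80
```

Note that the result depends on the population: if the spread of true scores shrinks (a more homogeneous population) while the measurement error stays the same, the reliability coefficient drops.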

Measurement Reliability

  • No measurement is perfectly reliable
    • Strive for 0.7 or higher in research
    • 0.6 is “borderline”.
    • Might require 0.9 or higher for individual decisions

Indirect measures of the reliability coefficient

  • Test-retest
  • Interrater
  • Internal consistency

Parallel forms

  • “No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.”
    • Heraclitus
  • Used when you can’t run the same measurement twice.
  • How to develop parallel forms
    • Change the question order
    • Minor changes to the wording
  • Difficult to develop two parallel forms of the same measurement.

Split half reliability

  • Only used for composite measurements
  • Split into halves, correlated
    • Odd-even split
    • Random split
  • Spearman-Brown adjustment
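
A minimal sketch of an odd-even split with the Spearman-Brown adjustment, using made-up item responses for a six-item composite:

```python
# Split half reliability for a 6-item composite, odd-even split.
# Hypothetical data: each row is one participant's item responses.
data = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 4, 3],
]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

odd = [sum(row[0::2]) for row in data]   # items 1, 3, 5
even = [sum(row[1::2]) for row in data]  # items 2, 4, 6

r_half = pearson(odd, even)
# Spearman-Brown adjustment: estimates the reliability of the full-length
# test from the correlation between its two half-length versions.
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```

The adjusted value is always at least as large as the half-test correlation, because a longer test is more reliable than either of its halves.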

Break #7

  • What you have learned
    • Split half reliability
  • What’s coming next
    • KR-20 and Cronbach’s alpha

Kuder-Richardson 20 formula

  • Only for composite measures with binary items
  • Book’s formula is confusing
    • \(S^2\) and \(\sigma^2\) used interchangeably
    • Subscripts are missing
  • Correct formula
    • Let \(B_{ni}\) be the binary (0 or 1) response of subject \(n\) to item \(i\), with \(N\) subjects and \(I\) items
    • \(X_n=\Sigma_i B_{ni}\) is the total score for subject \(n\)
    • \(\text{KR-20}=\frac{I}{I-1}\big(1-\frac{\Sigma_i \hat{p}_i\hat{q}_i}{S^2}\big)\)
    • where \(\hat{p}_i=\frac{1}{N}\Sigma_n B_{ni}\), \(\hat{q}_i=1-\hat{p}_i\),
    • and \(S^2=\frac{1}{N-1}\Sigma_n(X_n-\bar{X})^2\)

Kuder-Richardson 20 interpretation

  • KR-20
    • \(\frac{I}{I-1}\big(1-\frac{\Sigma_i \hat{p}_i\hat{q}_i}{S^2}\big)\)
    • \(\Sigma_i \hat{p}_i\hat{q}_i\) is a theoretical minimum variation
    • \(S^2\) is the observed variation of the total scores
    • \(S^2 = \Sigma_i \hat{p}_i\hat{q}_i\) implies randomness (no internal consistency)
    • \(S^2 > \Sigma_i \hat{p}_i\hat{q}_i\) implies internal consistency

Cronbach’s alpha, formula

  • Used for composite measurements with continuous items
  • Book’s formula is confusing
    • \(\Sigma S^2\) should be \(\Sigma S_i^2\)
  • Correct formula
    • Let \(X_{ni}\) be the response of subject \(n\) to item \(i\); \(X_n=\Sigma_i X_{ni}\) is the total score
    • \(\alpha=\frac{I}{I-1}\Big(1-\frac{\Sigma_i S_i^2}{S^2}\Big)\)
    • where \(S_i^2=\frac{1}{N-1}\Sigma_n(X_{ni}-\bar{X}_i)^2\) is the variance of item \(i\), and
    • \(S^2=\frac{1}{N-1}\Sigma_n(X_n-\bar{X})^2\) is the variance of the total scores
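
A minimal numeric sketch of Cronbach's alpha with hypothetical ordinal item responses (rows are subjects, columns are items):

```python
# Cronbach's alpha for a composite with continuous/ordinal items
# (hypothetical 3-item, 6-subject data).
X = [
    [4.0, 5.0, 4.0],
    [2.0, 1.0, 2.0],
    [3.0, 3.0, 4.0],
    [5.0, 5.0, 5.0],
    [1.0, 2.0, 1.0],
    [4.0, 4.0, 3.0],
]
N, I = len(X), len(X[0])

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

item_vars = [var([row[i] for row in X]) for i in range(I)]  # S_i^2
totals = [sum(row) for row in X]                            # X_n
alpha = (I / (I - 1)) * (1 - sum(item_vars) / var(totals))
print(round(alpha, 3))
```

When the items are binary, this computation reduces to KR-20, since a 0/1 item's variance is \(p(1-p)\).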

Cronbach’s alpha, interpretation

  • Cronbach’s \(\alpha\)
    • \(\frac{I}{I-1}\Big(1-\frac{\Sigma_i S_i^2}{S^2}\Big)\)
    • \(\Sigma_i S_i^2\) is a theoretical minimum variation
    • \(S^2\) is the observed variation of the total scores
    • \(S^2 = \Sigma_i S_i^2\) implies randomness (no internal consistency)
    • \(S^2 > \Sigma_i S_i^2\) implies internal consistency
  • Cronbach’s alpha is NOT a measure of unidimensionality

Break #8

  • What you have learned
    • KR-20 and Cronbach’s alpha
  • What’s coming next
    • Test-retest and interrater reliability

Test-retest reliability (also called repeatability)

  • Correlation of two measurements separated by time
  • Length of time interval is critical
    • No carry-over
    • No changes in the true score
  • Useful for composite scores and single values
  • Useful for self-report and researcher evaluation
  • Not possible for some measures

Inter-rater reliability

  • Used for researcher evaluations only
  • Simplest case
    • Two independent raters
    • Ratings for every patient
  • Analysis
    • Intraclass correlation or Cohen’s Kappa
  • Extensions
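
A minimal sketch of the simplest case, Cohen's kappa for two independent raters, with hypothetical binary ratings:

```python
# Cohen's kappa for two independent raters who rate every patient
# (hypothetical binary ratings: 1 = condition present, 0 = absent).
rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
rater2 = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement

# Agreement expected by chance alone, from each rater's marginal proportions.
p1 = sum(rater1) / n
p2 = sum(rater2) / n
expected = p1 * p2 + (1 - p1) * (1 - p2)

# Kappa rescales agreement so 0 = chance-level and 1 = perfect agreement.
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 3))
```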

Practical guidance on reliability

  • Is there previous literature?
    • Report their reliability coefficients
  • Is your setting similar?
    • Different demographics?
    • Different cultural norms?
    • Different literacy?
    • Different language?
  • Compare to reliability in your sample
    • Test-retest and inter-rater reliability preferred.
    • 0.7 or higher

Break #9

  • What you have learned
    • Test-retest and interrater reliability
  • What’s coming next
    • Face and content validity

Types of measurement validity

  • Face validity
  • Content validity
  • Response process evidence
  • Criterion validity
  • Construct validity

Face validity and content validity

  • Face validity
    • Opinions from your patients
    • Subjective and unquantifiable
  • Only used for composite measures
  • Content validity
    • Opinions from outside experts
    • Subjective and unquantifiable
  • Only used for composite measures

Question.

  • Should Statisticians work on problems that are subjective and unquantifiable?

Answer.

  • Yer darn tootin!

Response process evidence

  • Observe the process
    • Watch as patients fill out the form
    • Ask questions along the way
    • Monitor response times
    • Encourage them to think aloud
  • Supplement with interview
  • Goal is to identify problematic elements
    • Confusion, misunderstandings, language issues

Break #10

  • What you have learned
    • Face and content validity
  • What’s coming next
    • Criterion and construct validity

Criterion validity

  • Comparison to external criterion
    • Represents “truth”
    • Not always available
  • Predictive evidence
    • Measurement in the future
    • Be careful about dropouts
  • Concurrent evidence
    • Measured at the same time

Construct validity

  • Used for a psychological construct
  • No direct measure of the truth exists
  • Define associations consistent with your construct
    • Does your measurement show the expected association?
    • Known as convergent evidence
  • Define non-associations with your construct
    • Does your measurement also show non-association?
    • Known as discriminant or divergent evidence

Alternative framework for validity

  • Content
  • Response processes
  • Internal structure
  • Relations to other variables
  • Consequences

Validity of diagnostic tests

  • Sensitivity
    • A test’s ability to obtain a positive result when the target condition is really present
  • Specificity
    • A test’s ability to obtain a negative result when the target condition is really absent
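
In a 2x2 table, these definitions reduce to simple proportions. A sketch with hypothetical counts:

```python
# Sensitivity and specificity from a 2x2 table of hypothetical counts.
tp, fn = 90, 10  # target condition present: test positive / test negative
tn, fp = 80, 20  # target condition absent:  test negative / test positive

sensitivity = tp / (tp + fn)  # P(test positive | condition present)
specificity = tn / (tn + fp)  # P(test negative | condition absent)
print(sensitivity, specificity)  # 0.9 0.8
```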

Break #11

  • What you have learned
    • Criterion and construct validity
  • What’s coming next
    • Case studies revisited

Case study #1 - Neighborhood environment survey

  • Reliability - What you can’t do
    • Inter-rater reliability
  • Reliability - What you can do
    • Test-retest reliability
    • Cronbach’s alpha

Case study #1 - Neighborhood environment survey

  • Validity - What you can’t do
    • Criterion validity
  • Validity - What you can do
    • Face/content validity
    • Response process validity
    • Factor analysis
    • Construct validity

Case study #2 - Pain scale

  • Reliability - What you can’t do
    • Inter-rater reliability
    • Cronbach’s alpha
  • Reliability - What you can do
    • Test-retest reliability

Case study #2 - Pain scale

  • Validity - What you can’t do
    • Criterion validity
    • Face/content validity
    • Response process validity
    • Factor analysis
  • Validity - What you can do
    • Construct validity

Case study #3 - Apgar score

  • Reliability - What you can’t do
    • Test-retest reliability
  • Reliability - What you can do
    • Inter-rater reliability
    • Cronbach’s alpha

Case study #3 - Apgar score

  • Validity - What you can’t do
    • Criterion validity
  • Validity - What you can do
    • Face/content validity
    • Response process validity
    • Construct validity

Case study #4 - Boston Bowel Prep Score

  • Reliability - What you can’t do
    • Test-retest reliability
    • Cronbach’s alpha
  • Reliability - What you can do
    • Inter-rater reliability

Case study #4 - Boston Bowel Prep Score

  • Validity - What you can’t do
    • Face/content validity
    • Response process validity
    • Construct validity
  • Validity - What you can do
    • Criterion validity

Case study #5 - Disgust Scale Revised

  • What do you think?
  • What measures of reliability?
    • Test-retest reliability
    • Inter-rater reliability
    • Measures of internal consistency
  • What measures of validity?
    • Face/content validity
    • Response process validity
    • Criterion validity
    • Construct validity

Conclusion

  • What you’ve seen today
    • Internal validity
    • External validity
    • Measurement reliability
    • Measurement validity
    • Three dichotomies of measurement
    • Five case studies